Google IT support — Networking


1. The TCP/IP network model

  1. Physical layer

    • represents the physical devices that interconnect computers
    • includes specifications for the networking cables and the connectors that join devices together
    • includes specifications describing how signals are sent over these connections
  2. Data link layer (or network interface, or network access layer)

    • responsible for defining a common way of interpreting signals between devices on the same network
    • responsible for transmitting data across a single link
    • most common protocols: Ethernet and Wi-Fi
  3. Network layer (or internet layer)

    • responsible for getting data delivered across a collection of networks, i.e., allows different networks to communicate with each other through routers
    • responsible for transmitting data between two individual nodes
    • internetwork is a collection of networks connected together through routers (e.g., the Internet)
    • most common protocol: IP (internet protocol)
    • network software: client (application initiating a request for data) and server (software answering the request across the network)
  4. Transport layer

    • responsible for ensuring that data gets to the right client and server programs
    • most common protocols: TCP (transmission control protocol) and UDP (user datagram protocol)
    • TCP provides mechanisms to ensure that data is reliably delivered, and UDP does not
  5. Application layer:

    • a lot of application specific protocols

Analogy: the physical layer is the delivery truck and the roads; the data link layer is how the delivery trucks get from one intersection to the next; the network layer identifies which roads need to be taken to get from address A to address B; the transport layer ensures that delivery driver knows how to knock on your door to tell you your package has arrived; the application layer is the contents of the package itself.


2. Network devices

2.1. Cables

Copper cables:

Fiber (or fiber optic cables):

2.2. Hubs and switches

Used to connect devices on the same network (LAN, or local area network).

Hub is a physical layer device that allows for connections from many computers at once.

Network switch is a data link layer device that allows for connection from many computers at once.

2.3. Routers

Used to connect devices on different networks.

Router is a network layer device that forwards data between independent networks.

2.4. Servers and clients

Client is something that requests data.

Server is something that provides data to a client.


3. The physical layer

3.1. Binary across the wire

The physical layer consists of devices and means of transmitting bits across computer networks. A standard copper network cable carries a constant electrical charge, and bits are encoded by modulating its voltage; the scheme that lets the receiver recover ones and zeros from these modulations is called line coding.

3.2. Ethernet over twisted pair

The most common connection type used in computer networking is known as twisted pair (pairs of copper wires that are twisted together). These pairs act as a single conduit for information, and their twisted nature protects signal against electromagnetic interference and crosstalk from neighboring pairs.

Cat6 cable has eight wires consisting of four twisted pairs inside a single jacket. Exactly how many pairs are in use depends on the transmission technology being used.

Simplex communication: information can flow only unidirectionally across the cable.

Duplex communication: information can flow in both directions across the cable (e.g., phone call).

To ensure duplex communication, networking cables reserve one or two pairs for communicating in one direction, and the other one or two pairs for the other direction.

Ethernet over twisted pair technologies are the communication protocols that determine the volume of data that can be transferred, the transfer rate, and the distance at which the quality of this data begins to degrade.

3.3. Network ports

Twisted pair network cables are terminated with a plug that takes the individual internal wires and exposes them. The most common plug: RJ-45 (registered jack 45).

A network cable with an RJ-45 plug can connect to an RJ-45 network port. Network ports are generally directly attached to the devices. Switches have many network ports (because their purpose is to connect many devices), while servers and desktops usually only have one or two.

Most network ports have two LEDs: link light and activity light. The link light is lit when devices are connected and powered on, and the activity light is lit when data is actively transmitted. On switches, sometimes the same LED is used for both link and activity status (it also might indicate other things like link speed).

3.4. Patch panels

Sometimes a network port isn't connected directly to a device; instead, network ports might be mounted in a wall or underneath a desk. These ports are generally connected to the network via cables run through the walls that eventually end at a patch panel (a device that contains many network ports but does no other work).

A patch panel is just a container for the endpoints of many runs of cable. Additional cables are then generally run from the patch panel to switches or routers to provide network access.


4. The data link layer

4.1. Ethernet

Wireless and cellular internet access are becoming some of the most common ways to connect to networks, but traditional cable networks are still the most common option. Ethernet is the most widely used protocol to send data across individual links.

4.2. MAC address

A MAC address is a globally unique 48-bit identifier (six octets) attached to an individual network interface.

A MAC address is split into two sections:

  1. The first three octets: OUI (organizationally unique identifier), assigned to hardware manufacturers by IEEE (Institute of Electrical and Electronics Engineers).
  2. The last three octets: assigned by the manufacturer in any way it likes, on the condition that each address it issues is unique.
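The two-part structure described above can be illustrated with a minimal sketch. The function name `split_mac` and the sample address are illustrative only; the code assumes the usual colon- or hyphen-separated hexadecimal notation.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def split_mac(mac: str) -> tuple[str, str]:
    """Split a MAC address into its OUI and the manufacturer-assigned part."""
    octets = mac.replace("-", ":").upper().split(":")
    valid = len(octets) == 6 and all(
        len(octet) == 2 and set(octet) <= HEX_DIGITS for octet in octets
    )
    if not valid:
        raise ValueError(f"not a six-octet MAC address: {mac!r}")
    # First three octets: OUI; last three: assigned by the manufacturer.
    return ":".join(octets[:3]), ":".join(octets[3:])


oui, device_part = split_mac("00:11:22:33:44:55")
print(oui, device_part)  # 00:11:22 33:44:55
```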

4.3. Unicast, multicast, and broadcast

Unicast: transmission is meant just for one address.

Multicast: transmission is meant for several addresses.

Unicast and multicast frames are sent to all devices on the collision domain. The difference: unicast frame is only received and processed by the intended destination, while multicast frame will be accepted or discarded by devices depending on criteria aside from their MAC addresses (network interfaces can be configured to accept lists of multicast addresses).

Broadcast: transmission is meant for every device on a LAN.

Accomplished by using a special destination known as a broadcast address (FF:FF:FF:FF:FF:FF). Ethernet broadcasts are used so that devices can learn more about each other.
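As a rough sketch, the three kinds of destination can be told apart from the destination MAC address alone. The broadcast address comes from the text above; the multicast test (least significant bit of the first octet set to 1) is the IEEE 802 addressing convention and is not stated in these notes. The function name is illustrative.

```python
def classify_destination(mac: str) -> str:
    """Return 'broadcast', 'multicast' or 'unicast' for a destination MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a six-octet MAC address: {mac!r}")
    if all(octet == 0xFF for octet in octets):
        return "broadcast"
    # Assumption from IEEE 802: the lowest bit of the first octet
    # marks a group (multicast) address.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


print(classify_destination("FF:FF:FF:FF:FF:FF"))  # broadcast
print(classify_destination("01:00:5E:00:00:FB"))  # multicast
print(classify_destination("00:11:22:33:44:55"))  # unicast
```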

4.4. Ethernet frames

Data in computer networks is sent in packets. The term data packet isn't tied to any specific layer or technology; it just represents a concept. At the Ethernet level, data packets are called Ethernet frames. An Ethernet frame consists of the following fields:

  1. Preamble (8 bytes)

    • first section (7 bytes): acts as a buffer between frames and can also be used to synchronize internal clocks to regulate the speed of data transfer
    • second section (1 byte): SFD (start frame delimiter), signals that the preamble is over
  2. Destination and source MAC addresses (12 bytes)

  3. VLAN tag (4 bytes; optional)

    • VLAN (virtual LAN) header indicates that the frame is a VLAN frame
    • VLANs make it possible to have multiple logical LANs on the same physical equipment; they are usually used to segregate different forms of traffic on one network
    • any VLAN frame will only be delivered out of a switch interface configured to relay that specific tag
  4. EtherType (2 bytes)

    • used to describe the protocol of the contents of the frame (i.e., IP, ARP, etc.)
  5. Payload (46-1500 bytes)

    • actual data being transported (contains all of the data from higher layers)
  6. FCS (frame check sequence) (4 bytes)

    • represents a checksum value for the entire frame
    • checksum value is calculated by performing a CRC (cyclical redundancy check) against the frame to ensure data integrity (CRC is a mathematical transformation that uses polynomial division to create a number that represents a larger set of data)
    • if the checksum computed by the receiver doesn't match with FCS, the data is thrown out; then it's up to a protocol at a higher layer to decide if that data should be retransmitted, Ethernet itself only reports on data integrity and doesn't perform data recovery
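The FCS check described above can be sketched as follows. This is an illustration rather than an exact model of the bits on the wire: it assumes the CRC-32 from Python's `zlib` module (which uses the same polynomial as Ethernet), omits the preamble and VLAN tag, and uses made-up addresses and payload.

```python
import struct
import zlib


def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum computed over the whole frame."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the checksum on the receiving side and compare with the FCS."""
    body, received_fcs = frame[:-4], frame[-4:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return expected == received_fcs


dst = bytes.fromhex("ffffffffffff")        # broadcast destination
src = bytes.fromhex("001122334455")        # example source address
ethertype = struct.pack("!H", 0x0800)      # 0x0800 = IPv4
payload = b"hello".ljust(46, b"\x00")      # padded to the 46-byte minimum

frame = add_fcs(dst + src + ethertype + payload)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip a single bit

print(len(frame))         # 64
print(fcs_ok(frame))      # True
print(fcs_ok(corrupted))  # False
```

A receiver that sees `False` here simply drops the frame; as noted above, any retransmission is left to a higher layer.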

5. The network layer

The physical and data link layers are responsible for transferring data over short distances on a single LAN segment. The network layer makes it possible to transfer data over greater distances across many networks.

The MAC addressing scheme works well on a LAN (because switches can quickly learn about MAC addresses in use), but it fails to scale well. Since MAC addresses are unique and not ordered in any systematic way, there is no way of knowing where on the planet a certain MAC address might be. Solution for this problem is found in the network layer and IP (internet protocol).

5.1. IP address

IP address is a number assigned to each device connected to a computer network.

Dynamic IP address: automatically assigned by a network (usually reserved for clients).

Static IP address: manually configured on a node (usually reserved for servers and network devices).

5.2. IP datagrams and encapsulation

Under the IP protocol packets are called IP datagrams. Each IP datagram consists of two parts: header and payload.

  1. Version of IP (4 bits)

    • IPv4 or IPv6
  2. Header length (4 bits)

    • minimum length of the header: 20 bytes (IPv4)
  3. Service type (8 bits)

    • used to specify details about QoS (quality of service) technologies (there are services that allow routers to create priority lists for IP datagrams)
  4. Total length (16 bits)

    • maximum length of the IP datagram: 2^16 - 1 = 65535 bytes (the largest value a 16-bit field can hold)
  5. Identification (16 bits)

    • if the data doesn't fit in a single IP datagram then IP splits it in pieces; identification is a number that's used to group messages together
    • packets with the same identification field are parts of the same transmission
  6. Flags (3 bits)

    • used to indicate if datagram is allowed to be fragmented or if it has already been fragmented (fragmentation is the process of splitting an IP datagram into several smaller datagrams)
    • most networks operate with similar settings regarding allowed datagram sizes, but sometimes this could be configured differently
    • if a datagram has to cross from a network allowing a larger datagram size to one with a smaller size, it would have to be fragmented
  7. Fragmentation offset (13 bits)

    • used to put fragmented datagrams in the correct order
  8. TTL (time to live) (8 bits)

    • number of router hops a datagram can traverse before it's thrown away (every time a datagram reaches a new router, its TTL is decreased by one; if it's zero, then router doesn't forward the datagram any further)
    • purpose of TTL is to make sure there's no endless loops even if something isn't configured correctly
  9. Protocol (8 bits)

    • what transport layer protocol is being used (e.g., TCP or UDP)
  10. Header checksum (16 bits)

    • checksum of the contents of the IP datagram header
    • since TTL has to be recomputed at every router that a datagram touches, the checksum will be changed too
  11. Source and destination IP addresses (64 bits)

  12. IP options (optional)

    • used to set special characteristics for datagrams (used for testing purposes)
  13. Padding

    • since the IP options field is optional and variable in length, the padding field is just a series of zeros used to ensure correct header size
  14. Payload

The entire contents of an IP datagram are encapsulated as the payload of an Ethernet frame. In the same way, the payload of the IP datagram contains a segment or datagram from the transport layer, and so on. This process is known as encapsulation.
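The fixed 20-byte part of the header listed above can be packed and parsed with Python's `struct` module. This is a minimal sketch: it assumes no IP options (so no padding), and the addresses and helper names are illustrative. The checksum routine is the standard 16-bit one's-complement sum over the header.

```python
import socket
import struct

IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes, no options


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over an even-length byte string."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(src: str, dst: str, payload_len: int, ttl: int = 64,
                 protocol: int = 6, ident: int = 1) -> bytes:
    fields = [
        (4 << 4) | 5,           # version 4, header length 5 words (20 bytes)
        0,                      # service type
        20 + payload_len,       # total length of the datagram
        ident,                  # identification
        0,                      # flags (3 bits) + fragmentation offset (13 bits)
        ttl,
        protocol,               # 6 = TCP, 17 = UDP
        0,                      # checksum placeholder while computing
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    fields[7] = ones_complement_checksum(IPV4_HEADER.pack(*fields))
    return IPV4_HEADER.pack(*fields)


def parse_header(header: bytes) -> dict:
    (ver_ihl, service, total_length, ident, flags_frag, ttl, protocol,
     checksum, src, dst) = IPV4_HEADER.unpack(header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,
        "service_type": service,
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


header = build_header("10.1.1.100", "172.16.1.100", payload_len=20)
parsed = parse_header(header)
print(parsed)
# A header whose checksum field is correct sums to zero.
print(ones_complement_checksum(header) == 0)  # True
```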

5.3. IP address classes

IP addresses can be split into two sections: the network ID and the host ID. There are three primary address classes, which differ in how many octets belong to each section (network ID octets : host ID octets): A (1:3), B (2:2), and C (3:1).
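A short sketch of the classful split. The first-octet ranges (0-127 for A, 128-191 for B, 192-223 for C) are the standard classful boundaries and are not spelled out in these notes; the function name is illustrative.

```python
def address_class(ip: str) -> tuple[str, str, str]:
    """Return (class, network ID, host ID) for a class A, B or C address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    first = int(octets[0])
    if first < 128:
        name, network_octets = "A", 1
    elif first < 192:
        name, network_octets = "B", 2
    elif first < 224:
        name, network_octets = "C", 3
    else:
        raise ValueError("outside classes A-C")
    network_id = ".".join(octets[:network_octets])
    host_id = ".".join(octets[network_octets:])
    return name, network_id, host_id


print(address_class("9.100.100.100"))  # ('A', '9', '100.100.100')
print(address_class("172.16.1.100"))   # ('B', '172.16', '1.100')
print(address_class("192.168.1.1"))    # ('C', '192.168.1', '1')
```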

5.4. Address resolution protocol

ARP is a protocol that's used to discover MAC addresses of nodes from IP addresses.

ARP table is a list of associated IP and MAC addresses.

Once an IP datagram has been formed, it needs to be encapsulated inside an Ethernet frame. To do this the transmitting device needs to know a destination MAC address.

  1. Destination MAC address is first searched in the local ARP table.
  2. If there is no such entry, then the node sends a broadcast ARP message (FF:FF:FF:FF:FF:FF), which is delivered to all computers on the LAN.
  3. When the wanted node receives ARP broadcast, it sends back an ARP response with the MAC address in question.
  4. Then the transmitting device will receive it and store it in the local ARP table (ARP table entries generally expire after a short amount of time to ensure changes in the network are accounted for).
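The lookup steps above can be sketched with a small in-memory table. Everything here is hypothetical: the 60-second entry lifetime, the class and function names, and the `lan` dictionary, which stands in for the nodes that would answer a real ARP broadcast.

```python
import time


class ArpTable:
    """IP-to-MAC cache whose entries expire after a fixed lifetime."""

    def __init__(self, lifetime_seconds: float = 60.0, clock=time.monotonic):
        self._entries = {}
        self._lifetime = lifetime_seconds
        self._clock = clock

    def lookup(self, ip: str):
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip]  # stale entry: forget it
            return None
        return mac

    def store(self, ip: str, mac: str) -> None:
        self._entries[ip] = (mac, self._clock() + self._lifetime)


def resolve(ip: str, table: ArpTable, lan: dict) -> str:
    """Steps 1-4: check the table, otherwise 'broadcast' and cache the reply."""
    mac = table.lookup(ip)
    if mac is not None:
        return mac
    # Stand-in for a broadcast to FF:FF:FF:FF:FF:FF: only the owner replies.
    mac = lan.get(ip)
    if mac is None:
        raise LookupError(f"no ARP response for {ip}")
    table.store(ip, mac)
    return mac


lan = {"10.1.1.1": "00:11:22:33:44:55"}
table = ArpTable()
print(resolve("10.1.1.1", table, lan))  # 00:11:22:33:44:55
print(table.lookup("10.1.1.1"))         # 00:11:22:33:44:55
```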

5.5. Subnetting

Subnetting is the process of splitting a large network into many smaller subnets (each of which has its own gateway router serving as the ingress and egress point).

A subnet mask is a 32-bit number (4 octets) consisting of a run of ones followed by a run of zeros: 1...10...0.

Subnetting is implemented using subnet masks. They add subnet ID to the IP address and extend what's possible with just network IDs and host IDs (and CIDR allows even more flexibility).

The size of a subnet is entirely defined by its subnet mask. In general, a subnet can contain two fewer hosts than the total number of host IDs available: for example, with the mask 255.255.255.0, host ID 0 is generally not used and 255 is normally reserved as the broadcast address.

For convenience subnet masks are sometimes abbreviated in the following way (CIDR notation):
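For instance, the mask 255.255.255.0 has 24 leading ones and is written /24. A minimal sketch using Python's standard `ipaddress` module shows the conversion and the "two fewer than the total" rule for usable hosts. The function name and sample addresses are illustrative, and the host count is only meaningful for prefixes of /30 or shorter.

```python
import ipaddress


def describe_subnet(address: str, mask: str) -> tuple[str, int, int]:
    """Return (network in CIDR notation, prefix length, usable host count)."""
    network = ipaddress.ip_network(f"{address}/{mask}", strict=False)
    # Subtract the all-zeros host ID and the broadcast address.
    usable_hosts = network.num_addresses - 2
    return str(network), network.prefixlen, usable_hosts


print(describe_subnet("9.100.100.100", "255.255.255.0"))
# ('9.100.100.0/24', 24, 254)
print(describe_subnet("9.100.100.100", "255.255.255.224"))
# ('9.100.100.96/27', 27, 30)
```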

5.6. CIDR

Address classes and traditional subnetting weren't the most efficient way of organizing IP addresses. The sizing of networks was impractical: 254 hosts for class C networks, but 65534 hosts for class B, and no option in between (so many companies ended up adjoining several class C networks together).

CIDR (classless inter-domain routing) is a more flexible approach to this problem. With CIDR, the network ID and subnet ID are combined into one, so CIDR abandons the concept of address classes entirely.
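As an illustration of that flexibility, two adjacent class C sized networks can be described as a single /23 network under CIDR. The sketch below uses the standard `ipaddress` module; the specific networks are chosen only as an example.

```python
import ipaddress

two_class_c = [
    ipaddress.ip_network("192.168.0.0/24"),
    ipaddress.ip_network("192.168.1.0/24"),
]

# collapse_addresses merges adjacent networks into the smallest covering set.
merged = list(ipaddress.collapse_addresses(two_class_c))

print(merged)                         # [IPv4Network('192.168.0.0/23')]
print(merged[0].netmask)              # 255.255.254.0
print(merged[0].num_addresses - 2)    # 510
```

A /23 gives 510 usable hosts, a size that falls between the class C and class B limits mentioned above.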

CIDR:

5.7. Routing

Router is a network device that forwards traffic depending on the destination address of that traffic.

Basic routing:

  1. Router receives a data packet.
  2. Strips away the data link layer encapsulation and examines the destination IP address.
  3. Looks up the network of the destination IP address in the routing table.
  4. Forms a new packet: copies original IP datagram, decrements TTL and recalculates a checksum.
  5. Encapsulates this new IP datagram inside of a new Ethernet frame with its own MAC address in the source MAC address field.
  6. Sends traffic forward.
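The routing steps above can be sketched as a lookup followed by a TTL decrement. The routing table, its /24 masks, and the interface names are hypothetical, and choosing the most specific matching network (longest prefix) is an assumption of this sketch. A real router would also recalculate the header checksum and build the new Ethernet frame.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Datagram:
    src: str
    dst: str
    ttl: int
    payload: bytes = b""


# (destination network, next hop or None if directly connected, interface)
ROUTES = [
    (ipaddress.ip_network("10.1.1.0/24"), None, "eth0"),
    (ipaddress.ip_network("192.168.1.0/24"), None, "eth1"),
    (ipaddress.ip_network("172.16.1.0/24"), "192.168.1.1", "eth1"),
]


def lookup(dst: str, routes=ROUTES):
    """Step 3: find the most specific network containing the destination."""
    address = ipaddress.ip_address(dst)
    matches = [route for route in routes if address in route[0]]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)


def forward(datagram: Datagram, routes=ROUTES) -> Optional[tuple]:
    """Steps 2-6: returns (new datagram, next hop IP, interface) or None."""
    new_ttl = datagram.ttl - 1
    if new_ttl <= 0:
        return None  # TTL exhausted: the datagram is thrown away
    route = lookup(datagram.dst, routes)
    if route is None:
        return None  # no known path to the destination network
    _, next_hop, interface = route
    # Step 4: copy the datagram with the decremented TTL
    # (the header checksum would be recalculated here).
    new_datagram = replace(datagram, ttl=new_ttl)
    return new_datagram, next_hop or datagram.dst, interface


result = forward(Datagram("10.1.1.100", "172.16.1.100", ttl=64))
print(result)
```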

5.8. Routing tables

Routing tables can vary a lot, but the most basic one will have four columns:

5.9. Routing protocols

Routing tables are continually updated with new information about the quickest paths to destination networks. This is done with the help of routing protocols, which fall into two groups: interior gateway protocols and exterior gateway protocols.

  1. Interior gateway protocols

    Used by routers to share information within a single autonomous system (collection of networks under the control of a single network operator).

    1. Distance vector protocols

      • a router takes its routing table and sends it to every neighboring router
      • routers don't know much about the total state of an autonomous system (only about their immediate neighbors), so they might be slow to react to a change in the network far away
      • mostly outdated
      • most common: RIP (routing information protocol), and EIGRP (enhanced interior gateway routing protocol)
    2. Link state protocols

      • more sophisticated approach for determining the best paths (each router advertises the state of the link of each of its interfaces)
      • every router on the system knows every detail about every other router and uses this data to run complicated algorithms to determine best paths
      • require more memory and processing power
      • most common: OSPF (open shortest path first)
  2. Exterior gateway protocol

    Used to communicate data between routers representing the edges of autonomous systems. Each autonomous system has a 32-bit number assigned to it called an ASN (autonomous system number), normally written as a single decimal number. These numbers, just like IP addresses, are allocated by IANA (Internet Assigned Numbers Authority).

    There is only one exterior gateway protocol in use today: BGP (border gateway protocol).

5.10. Non-routable address spaces

From the early days of the Internet it was clear that the available number of IPv4 addresses is too small, so in 1996 RFC1918 was published (request for comments). It outlined a number of networks that would be defined as non-routable address space: ranges of IPs set aside for use by anyone that cannot be routed to.
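RFC 1918 defines three such ranges: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. A minimal sketch of a membership check follows; the function name is illustrative.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip: str) -> bool:
    """True if the address lies in one of the non-routable RFC 1918 ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in RFC1918_NETWORKS)


print(is_rfc1918("10.1.1.1"))      # True
print(is_rfc1918("172.16.1.100"))  # True
print(is_rfc1918("172.32.0.1"))    # False
print(is_rfc1918("8.8.8.8"))       # False
```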


6. The transport layer

The transport layer allows traffic to be directed to specific network applications and the application layer allows these applications to communicate with each other. The transport layer is responsible for multiplexing and demultiplexing traffic, establishing long running connections and ensuring data integrity through error checking and data verification.

6.1. Multiplexing and demultiplexing

Multiplexing: nodes on the network have the ability to direct traffic toward many different receiving services.

Demultiplexing: taking traffic that's all aimed at the same node and delivering it to the proper receiving service.

6.2. TCP segment

Just like how an Ethernet frame encapsulates an IP datagram, an IP datagram encapsulates a TCP segment (TCP header + data section for application layer).

  1. Source and destination ports (32 bits)

    • source port: high numbered port chosen from a special section of ports known as ephemeral ports (a source port is required to keep outgoing connections separate)
    • destination port: port of the service the traffic is intended for
  2. Sequence number (32 bits)

    • gives the position of the current TCP segment in the sequence of segments to which it belongs
    • Ethernet frame is limited in size to 1518 bytes, but we usually need to send way more data than that, so at the transport layer TCP splits it up into sequence of segments
    • while TCP will generally send all segments in sequential order, they may not always arrive in that order
  3. Acknowledgment number (32 bits)

    • the sequence number the receiver expects to see next (i.e., everything before it has been received)
  4. Header length (4 bits)

  5. Control flags (6 bits)

    • six TCP control flags for establishing and closing TCP connections
  6. Window (16 bits)

    • specifies the range of sequence numbers that might be sent before an acknowledgement is required
    • this is done in order to make sure that all expected data is actually being received and that the sending device doesn't waste time sending data that isn't being received
  7. Checksum (16 bits)

  8. Urgent (16 bits)

    • rarely used
    • pointer which is used in conjunction with one of the TCP control flags to point out particular segments that might be more important than others
  9. Options (optional)

    • rarely used
  10. Padding

    • sequence of zeros to ensure that the payload section begins at the expected location
  11. Payload
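The fixed part of the header listed above (20 bytes when there are no options) can be packed and parsed with `struct`. This is a sketch under stated assumptions: the helper names are illustrative, the checksum is left at zero rather than computed, and the bits between the header length and the six control flags are treated as reserved.

```python
import struct

# ports (2x16), sequence (32), acknowledgment (32),
# header length + flags (16), window (16), checksum (16), urgent (16)
TCP_HEADER = struct.Struct("!HHIIHHHH")


def build_tcp_header(src_port: int, dst_port: int, seq: int, ack: int,
                     flags: int, window: int, checksum: int = 0,
                     urgent: int = 0) -> bytes:
    header_words = 5  # 5 x 32-bit words = 20 bytes, no options
    length_and_flags = (header_words << 12) | (flags & 0x3F)
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           length_and_flags, window, checksum, urgent)


def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack, length_and_flags,
     window, checksum, urgent) = TCP_HEADER.unpack(segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "sequence": seq,
        "acknowledgment": ack,
        "header_length": (length_and_flags >> 12) * 4,
        "flags": length_and_flags & 0x3F,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


SYN = 0x02
syn_segment = build_tcp_header(50000, 80, seq=1000, ack=0,
                               flags=SYN, window=65535)
tcp_fields = parse_tcp_header(syn_segment)
print(tcp_fields)
```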

6.3. TCP control flags

TCP establishes connections through the use of TCP control flags (6 bits):
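The six classic flags are URG, ACK, PSH, RST, SYN, and FIN. The sketch below models them as bits of the 6-bit field, using the standard bit positions from the TCP specification; the class and helper names are illustrative.

```python
from enum import IntFlag


class TcpFlag(IntFlag):
    FIN = 0x01  # finish: the sender has no more data and wants to close
    SYN = 0x02  # synchronize: opens a connection, sequence number is initial
    RST = 0x04  # reset: abort the connection
    PSH = 0x08  # push: deliver buffered data to the application promptly
    ACK = 0x10  # acknowledgment number field is significant
    URG = 0x20  # urgent pointer field is significant


ALL_FLAGS = [TcpFlag.FIN, TcpFlag.SYN, TcpFlag.RST,
             TcpFlag.PSH, TcpFlag.ACK, TcpFlag.URG]


def flag_names(bits: int) -> list[str]:
    """List the names of the flags set in a 6-bit control field."""
    return [flag.name for flag in ALL_FLAGS if bits & flag]


syn_ack = TcpFlag.SYN | TcpFlag.ACK
print(int(syn_ack))         # 18
print(flag_names(syn_ack))  # ['SYN', 'ACK']
```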

6.4. Establishing and closing TCP connections

  1. Three-way handshake (establishing the connection):

    1. Computer A sends a TCP segment to computer B with SYN flag set.
    2. Computer B responds with both the SYN and ACK flags set.
    3. Computer A responds again with just the ACK flag set.
    4. Now computer A is free to send whatever data it wants to computer B and vice versa (full duplex).

  2. Four-way handshake (closing the connection):

    1. Computer B sends a TCP segment with a FIN flag to computer A.
    2. Computer A responds with an ACK flag.
    3. Computer A sends FIN flag when ready.
    4. Computer B responds with an ACK flag.
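The two exchanges above can be written out as message lists. This is a toy model, not a TCP implementation: the initial sequence numbers are arbitrary, and the rule that a SYN is acknowledged with "sequence number + 1" comes from the TCP specification rather than from these notes.

```python
def three_way_handshake(a_isn: int, b_isn: int) -> list[tuple]:
    """Return (direction, flags, sequence, acknowledgment) for each step."""
    return [
        ("A -> B", "SYN", a_isn, None),
        ("B -> A", "SYN+ACK", b_isn, a_isn + 1),
        ("A -> B", "ACK", a_isn + 1, b_isn + 1),
    ]


def four_way_close() -> list[tuple]:
    """Return (direction, flag) for each step when B starts closing."""
    return [
        ("B -> A", "FIN"),
        ("A -> B", "ACK"),
        ("A -> B", "FIN"),
        ("B -> A", "ACK"),
    ]


for step in three_way_handshake(a_isn=1000, b_isn=5000):
    print(step)
for step in four_way_close():
    print(step)
```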

6.5. TCP socket states

TCP socket is an endpoint instance of a specific TCP connection or listening state defined by an IP address and a port.

TCP sockets can exist in lots of states:

There are other socket states that exist. Additionally, their names can vary depending on an OS (they exist outside of the scope of the definition of TCP itself).

6.6. Connection-oriented and connectionless protocols

Connection-oriented protocol is a protocol that establishes a connection and uses it to ensure that all data is properly transmitted with the help of acknowledgments (e.g., TCP).

There's a lot of extra traffic with connection-oriented protocols: establishing connections, sending a constant stream of acknowledgements, tearing the connection down at the end. But sometimes you don't need to know that every packet you send reaches its destination (e.g., when streaming video).

Connectionless protocol is a protocol that doesn't rely on acknowledgements and establishment of connections (e.g., UDP, or user datagram protocol).

6.7. System and ephemeral ports

The range that ports can occupy (0-65535) is split into independent sections:
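A sketch of the split, assuming the IANA ranges: system ports 1-1023, registered ports 1024-49151, and ephemeral ports 49152-65535, with port 0 not used for network traffic. The function name and category labels are illustrative.

```python
def port_category(port: int) -> str:
    """Classify a TCP/UDP port number by the range it falls into."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if port == 0:
        return "reserved"
    if port <= 1023:
        return "system"
    if port <= 49151:
        return "registered"
    return "ephemeral"


print(port_category(80))     # system
print(port_category(8080))   # registered
print(port_category(50000))  # ephemeral
```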

6.8. Firewalls

Firewall is a device that blocks traffic that meets certain criteria.


7. The application layer

There are a lot of protocols used at the application layer, but many of them are standardized across application types (e.g., web servers don't care which browser you use as long as both speak the same protocol).

7.1. OSI model

The OSI model is the most rigorously defined model. It has seven layers, with the application layer divided into three parts:

7.2. Networking in details

  1. User at computer 1 opens up a web browser and enters 172.16.1.100 into the address bar. The web browser communicates with the local networking stack (part of the OS), explains that it wants to establish a TCP connection with 172.16.1.100:80. The networking stack will now examine its own subnet: it sees that 172.16.1.100 lives on another network, so data has to be sent to gateway router at 10.1.1.1.

  2. Computer 1 looks at its ARP table to determine the MAC address of 10.1.1.1, but it doesn't find it. So it sends an ARP request for that IP address, which is broadcast to FF:FF:FF:FF:FF:FF (to every node on the LAN).

  3. Router A receives this ARP message and responds to computer 1 with its MAC address of 00:11:22:33:44:55. Computer 1 receives this response and now knows the hardware address of its gateway. It's ready to start constructing the outbound packet.

  4. Computer 1 has been asked by the web browser to form an outbound TCP connection, so the OS identifies an available ephemeral port, 50000, and opens a socket connecting the web browser to this port.

  5. Web browser needs to establish a TCP connection, so the networking stack starts to build a TCP segment. It fills in all the appropriate fields in the header: a source port of 50000, a destination port of 80, a sequence number, the SYN flag, and a checksum for the segment.

  6. TCP segment is now passed along to the IP layer of the networking stack. This layer constructs an IP header: the source IP, the destination IP, a TTL of 64 (standard value), etc. Next, the TCP segment is inserted as the data payload for the IP datagram, and a checksum is calculated.

  7. Now an Ethernet frame is constructed. All the relevant fields are filled in: the source and destination MAC addresses, etc. Finally, the IP datagram is inserted as the data payload and another checksum is calculated. Now the Ethernet frame is ready to be sent across the physical layer.

  8. The network interface connected to computer 1 sends this binary data as modulations of the voltage of an electrical current running across a Cat6 cable that's connected between it and a network switch. This switch receives the frame, inspects the destination MAC address, and forwards the frame to the destination.

  9. Router A receives the frame, calculates a checksum and compares it to the appropriate field in the header of the Ethernet frame.

  10. Router A strips away the Ethernet frame and performs a checksum calculation on IP datagram. If all is correct router A inspects the destination IP address and performs a lookup of this destination in the routing table. Router A sees that the quickest path to destination is one hop away through Router B, which has an IP of 192.168.1.1. Next, router A makes a new IP datagram: takes old payload section, decrements the TTL by 1 and calculates a new checksum.

  11. Next router A looks in its ARP table for 192.168.1.1 to get router B's MAC address. When found router A constructs an Ethernet frame with the MAC address of its interface on network B as the source and the MAC address of router B's interface on network B as the destination. Once the values for all fields in this frame have been filled out, router A places the newly constructed IP datagram into the data payload field, calculates a checksum, and places it into the frame header.

  12. The frame makes it across network B and is received by router B, where all the same checks are performed. Next, router B removes the Ethernet frame encapsulation and performs a checksum against the IP datagram. It then examines the destination IP address, looks at its routing table and sees that computer 2 (172.16.1.100) is on a locally connected network. So it decrements the TTL by 1 again, calculates a new checksum, and creates a new IP datagram. This new IP datagram is again encapsulated in a new Ethernet frame, with the MAC address of router B as the source and the MAC address of computer 2 as the destination. And the whole process is repeated one last time.

  13. The frame makes it across network C to computer 2 (a switch ensures it gets to the destination). Computer 2 strips away the Ethernet frame, performs a CRC and recognizes that the data has been delivered intact. It then examines the destination IP address and recognizes it as its own. Next, computer 2 strips away the IP datagram and examines the checksum for the TCP segment. Then the destination port is examined, and the networking stack on computer 2 ensures that there's an open socket on port 80: it's in the LISTEN state and held open by a running Apache web server. Computer 2 then sees that this packet has the SYN flag set, so it examines the sequence number and stores it, since it'll need it to fill in the acknowledgement field once it crafts the response.

So a single TCP segment containing a SYN flag has been delivered. Next computer 2 needs to send a SYN-ACK response to computer 1, which then needs to be acknowledged by computer 1.
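The decision made in step 1 (deliver directly or hand the data to the gateway) can be sketched with the `ipaddress` module. The walkthrough does not give computer 1's own address or mask, so `10.1.1.100/24` below is an assumption made purely for illustration, as is the function name.

```python
import ipaddress


def next_hop(local_interface: str, gateway: str, destination: str) -> str:
    """Return the IP whose MAC address the sender must resolve via ARP."""
    local_network = ipaddress.ip_interface(local_interface).network
    if ipaddress.ip_address(destination) in local_network:
        return destination  # same subnet: deliver directly
    return gateway          # different network: send to the gateway router


# Assumed address for computer 1; the gateway and destination are from the text.
print(next_hop("10.1.1.100/24", "10.1.1.1", "172.16.1.100"))  # 10.1.1.1
print(next_hop("10.1.1.100/24", "10.1.1.1", "10.1.1.42"))     # 10.1.1.42
```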


8. Networking services

8.1. DNS and name resolution

DNS (domain name system) is a global and highly distributed network service that resolves strings of letters into IP addresses.

Name resolution is the process of using DNS to turn a domain name into an IP address.

There are five primary types of DNS servers (one DNS server can fulfill many of these roles at once):

  1. Caching name servers

    • provided by an ISP or LAN
    • store domain name lookups for a certain amount of time, so full name resolution doesn't need to happen every single time (local computers will have DNS cache too)
    • most caching name servers are also recursive name servers
  2. Recursive name servers

    • perform full DNS resolution requests
    • domain names have a TTL (time to live), which is configured by the owner of a domain name (usually minutes or hours)
  3. Root name servers

    • respond to caching and recursive name servers with which TLD name server to contact
    • there are 13 total root name servers (13 "authorities" rather than physical servers) distributed across the globe via anycast (technique for routing traffic to different destinations depending on factors like location, congestion, or link health)
    • direct queries toward the appropriate TLD name server
  4. TLD name servers (top level domain)

    • respond to caching and recursive name servers with which authoritative name server to contact
    • the top of the hierarchical DNS name resolution system (the last part of any domain name)
    • for each TLD there is a TLD name server (but just like for root servers, that doesn't mean there's one physical server for each TLD)
  5. Authoritative name servers

    • respond to caching and recursive name servers with the IP of the server in question
    • responsible for the last two parts of any domain name (resolution at which a single organization may be responsible for DNS lookups)

This complicated hierarchical system for DNS resolutions controlled by trusted entities exists to protect users and to ensure that their traffic isn't being redirected by malicious parties.
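The chain of referrals (root, then TLD, then authoritative) together with caching can be illustrated with a toy model. All data here is made up: the server names are placeholders, the dictionaries stand in for real name servers, and `192.0.2.10` is taken from a range reserved for documentation. Cache expiry by TTL is left out for brevity.

```python
# Hypothetical referral data standing in for real name servers.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"example.com": "auth-example"}}
AUTHORITATIVE = {"auth-example": {"www.example.com": "192.0.2.10"}}


def resolve_name(name: str, cache: dict) -> tuple[str, list[str]]:
    """Return (IP address, list of servers consulted) for a domain name."""
    if name in cache:
        return cache[name], ["cache"]
    labels = name.split(".")
    # The root server points to the TLD name server for the last label.
    tld_server = ROOT[labels[-1]]
    # The TLD server points to the authoritative server for the last two labels.
    auth_server = TLD[tld_server][".".join(labels[-2:])]
    # The authoritative server answers with the actual address.
    address = AUTHORITATIVE[auth_server][name]
    cache[name] = address
    return address, ["root", tld_server, auth_server]


dns_cache = {}
print(resolve_name("www.example.com", dns_cache))
# ('192.0.2.10', ['root', 'tld-com', 'auth-example'])
print(resolve_name("www.example.com", dns_cache))
# ('192.0.2.10', ['cache'])
```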

8.2. DNS and transport layer protocols

DNS resolutions can generate a lot of traffic, so it's better to use connectionless protocols for this (like UDP).

If DNS resolver doesn't get a response via UDP it just asks again, i.e., error recovery functionality of TCP at the transport layer is provided by DNS at the application layer.

DNS over TCP is used when DNS lookup response can't fit in a single UDP datagram, in this case a name server would respond with a packet explaining that the response is too large and a TCP connection needs to be established.
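To make the UDP-first behavior concrete, the sketch below builds a minimal DNS query by hand and checks the truncation (TC) bit that a server sets when its answer does not fit in a UDP datagram. No network traffic is sent; the query ID, the domain name, and the function names are illustrative, and the sample "response" is just a hand-built header.

```python
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for an A record (header + one question)."""
    # ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    # Each label is sent as a length byte followed by its characters.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set, meaning the client should retry over TCP."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)


query = build_query("www.example.com")
print(len(query))           # 33
print(is_truncated(query))  # False

# Hand-built response header with the QR, TC and RD bits set.
truncated_response = struct.pack("!HHHHHH", 0x1234, 0x8300, 1, 0, 0, 0)
print(is_truncated(truncated_response))  # True
```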

8.3. Resource record types

DNS in practice operates with a set of defined resource record types, which allow for different kinds of DNS resolutions to take place. The most basic ones are:

There are lots of other DNS resource record types in common use like the NS records or SOA records which are used to define authority information about DNS zones.

8.4. Anatomy of a domain name

Any domain name has three primary parts: TLD, domain and subdomain. When you combine all of them together, you get an FQDN (fully qualified domain name).

  1. TLD (top level domain)

    • there is a limited number of TLDs available (.com, .net, .edu, country specific TLDs, .museum, .pizza, etc.)
    • administration and definition of TLDs is handled by ICANN (Internet Corporation for Assigned Names and Numbers); together with IANA it helps define and control both the global IP space and the global DNS system
  2. Domain

    • used to demarcate where control moves from a TLD name server to an authoritative name server
    • typically under the control of an independent organization outside of ICANN; can be registered and chosen by any individual or company
    • it costs money to officially register a domain with a registrar (a company that has an agreement with ICANN to sell unregistered domain names)
  3. Subdomain

    • sometimes referred to as a host name (if it's been assigned to only one host)
    • freely chosen and assigned by anyone who controls a registered domain
    • DNS supports up to 127 levels for FQDN
    • each individual section can only be 63 characters long and a FQDN is limited to 255 characters
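The limits listed above (63 characters per section, 255 characters in total, up to 127 levels) can be checked mechanically. A minimal sketch, with an illustrative function name; it only checks lengths and does not validate which characters are allowed.

```python
def fqdn_problems(name: str) -> list[str]:
    """Return a list of length-limit violations (empty if the name is fine)."""
    trimmed = name[:-1] if name.endswith(".") else name
    problems = []
    if len(trimmed) > 255:
        problems.append("FQDN longer than 255 characters")
    labels = trimmed.split(".")
    if len(labels) > 127:
        problems.append("more than 127 levels")
    for label in labels:
        if not label:
            problems.append("empty section")
        elif len(label) > 63:
            problems.append(f"section longer than 63 characters: {label[:10]}...")
    return problems


print(fqdn_problems("www.example.com"))  # []
print(fqdn_problems("a" * 64 + ".com"))
# ['section longer than 63 characters: aaaaaaaaaa...']
```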

8.5. DNS zones

Every DNS server is responsible for a specific DNS zone, e.g., root name servers are responsible for root zones, TLD name servers for zones covering specific TLDs, and authoritative name servers for even finer-grained zones underneath that.

8.6. DHCP

Configuring hosts on a network can be a very time-consuming task (every node needs an IP address, a subnet mask, a primary gateway, and a name server). DHCP is used to automate this work.

DHCP (dynamic host configuration protocol) is an application layer protocol that automates the configuration process for hosts on a network.

There are a few ways that DHCP can operate:

  1. Dynamic allocation

    • most common
    • range of IP addresses is set aside for client devices and issued to them when requested
    • IP address could be different every time a device connects to the network
  2. Automatic allocation

    • range of IP addresses is set aside for client devices and issued to them when requested
    • DHCP server keeps track of which IPs were assigned to which devices, and uses this information to assign the same IP address to the same machine every time (if possible)
  3. Fixed allocation

    • IP addresses are assigned according to the manually specified list of MAC addresses
    • if the MAC address isn't found, the DHCP server might fall back to automatic or dynamic allocation (or refuse to assign an IP altogether)
    • used as a security measure to ensure that only trusted devices can connect to the network
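
The difference between the three modes can be sketched with a toy allocator. Everything here (the class ToyDhcpPool, its parameters, the addresses) is hypothetical and only illustrates the bookkeeping; a real DHCP server also handles lease timers, conflicts, and much more.

```python
import ipaddress


class ToyDhcpPool:
    """Toy address pool: dynamic by default, automatic if remember=True."""

    def __init__(self, first_ip, size, fixed=None, remember=False):
        start = ipaddress.IPv4Address(first_ip)
        self.free = [str(start + i) for i in range(size)]  # addresses not in use
        self.fixed = dict(fixed or {})  # fixed allocation: MAC -> IP
        self.remember = remember        # True switches to automatic allocation
        self.history = {}               # MAC -> last IP it was given
        self.leases = {}                # MAC -> IP currently in use

    def request(self, mac):
        if mac in self.fixed:
            ip = self.fixed[mac]            # manually configured mapping wins
        elif self.remember and self.history.get(mac) in self.free:
            ip = self.history[mac]          # automatic: reuse the previous IP
        elif self.free:
            ip = self.free[0]               # dynamic: any free IP will do
        else:
            return None                     # pool exhausted
        if ip in self.free:
            self.free.remove(ip)
        self.leases[mac] = ip
        self.history[mac] = ip
        return ip

    def release(self, mac):
        ip = self.leases.pop(mac, None)
        if ip is not None and ip not in self.fixed.values():
            self.free.append(ip)  # returned to the end of the pool
```

With dynamic allocation a client that releases its address and reconnects may get a different one; with remember=True the same client gets its old address back as long as it is still free.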

DHCP can also distribute settings beyond the basic network configuration, e.g., the addresses of NTP (network time protocol) servers, which are used for network time synchronization.

8.7. DHCP discovery

The process by which a client gets network configuration information from a DHCP server is known as DHCP discovery.

  1. DHCP DISCOVER

    • client broadcasts a DHCP discover message over UDP from 0.0.0.0:68 to 255.255.255.255:67
    • DHCP server listens on UDP port 67 and catches this message
    • then DHCP server makes a decision on what, if any, IP address to offer to the client (depends on configured allocation)
  2. DHCP OFFER

    • DHCP server broadcasts a DHCP offer message from its actual IP address and source port 67 to 255.255.255.255:68
    • client recognizes that the message is intended for it because the offer contains a field with the MAC address of the client that sent the DHCP discover message (the message itself is a broadcast)
    • next, the client processes this offer (it may reject it, e.g., when multiple DHCP servers on the same network respond with offers)
  3. DHCP REQUEST

    • client broadcasts a DHCP request message from 0.0.0.0:68 to 255.255.255.255:67, requesting the IP address that was offered
    • DHCP server receives this message
  4. DHCP ACK

    • DHCP server broadcasts a DHCP acknowledgement message from its actual IP address and source port 67 to 255.255.255.255:68
    • again, the client recognizes that the message is intended for it by the client MAC address included in the message
    • networking stack on the client computer now has all the information it needs to set up its own network layer configuration

The configuration obtained this way is known as a DHCP lease, as it includes an expiration time (usually a few days or less). Once a lease has expired, the client needs to negotiate a new lease by performing the entire DHCP discovery process all over again. A client can also release its lease when disconnecting from the network (this allows the DHCP server to return the client's IP address to its pool of available IPs).
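
The four messages of the discovery process can be written down as plain data to make the addresses and ports easier to compare. This is an illustrative model only: DhcpMessage and dhcp_exchange are made-up names, and no real DHCP packets are built or sent.

```python
from dataclasses import dataclass

BROADCAST = "255.255.255.255"
UNSPECIFIED = "0.0.0.0"      # the client has no IP address yet
SERVER_PORT = 67
CLIENT_PORT = 68


@dataclass(frozen=True)
class DhcpMessage:
    kind: str
    source: str        # "ip:port" the UDP datagram is sent from
    destination: str   # "ip:port" the UDP datagram is sent to
    client_mac: str    # lets the client pick out the replies meant for it
    offered_ip: str = ""


def dhcp_exchange(client_mac, server_ip, offered_ip):
    """Return the DISCOVER, OFFER, REQUEST, ACK sequence for one client."""
    client = f"{UNSPECIFIED}:{CLIENT_PORT}"
    to_server = f"{BROADCAST}:{SERVER_PORT}"
    server = f"{server_ip}:{SERVER_PORT}"
    to_client = f"{BROADCAST}:{CLIENT_PORT}"
    return [
        DhcpMessage("DISCOVER", client, to_server, client_mac),
        DhcpMessage("OFFER", server, to_client, client_mac, offered_ip),
        DhcpMessage("REQUEST", client, to_server, client_mac, offered_ip),
        DhcpMessage("ACK", server, to_client, client_mac, offered_ip),
    ]
```

Printing the result of dhcp_exchange("aa:bb:cc:dd:ee:ff", "192.168.1.1", "192.168.1.100") shows that the client always sends from 0.0.0.0:68 and that both sides rely on broadcasts until the lease is in place.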

8.8. NAT

NAT (network address translation) is a technology that allows a gateway to rewrite the source IP of an outgoing IP datagram while remembering the original IP, so that it can be written back into the response.

  1. Port preservation

    • technique that's used to ensure that incoming traffic goes to the right nodes on a LAN: the source port chosen by a client is the same port used by the router (the router stores a table of the corresponding IP addresses and source ports)
    • when a router performs NAT on an outgoing packet, it rewrites the source IP address but leaves the source port number unchanged, so when the response arrives, the router knows where to forward it
    • it's possible for two different computers on a LAN to choose the same source port around the same time, so when this happens the router just selects another unused port instead

  2. Port forwarding

    • technique that's used to ensure that specific destination ports will always deliver traffic to specific nodes (allows for complete IP masquerading)
    • port forwarding also simplifies how external users interact with services run by the same organization (e.g., traffic for web server and mail server could be aimed at the same external IP address)
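
Both techniques boil down to a lookup table kept by the router. A minimal sketch, assuming a hypothetical ToyNat class and made-up addresses (real NAT implementations also track protocols, remote endpoints and timeouts):

```python
class ToyNat:
    """Toy translation table for a router with a single public IP."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}     # public port -> (private IP, private port), learned
        self.forwards = {}  # public port -> (private IP, private port), static

    def forward(self, public_port, private_ip, private_port):
        """Port forwarding: always deliver this public port to one node."""
        self.forwards[public_port] = (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet, keeping the port if possible."""
        origin = (private_ip, private_port)
        port = private_port  # port preservation: try the client's own port first
        for _ in range(65536):
            taken_by = self.forwards.get(port) or self.table.get(port)
            if taken_by is None or (port not in self.forwards and taken_by == origin):
                self.table[port] = origin
                return self.public_ip, port
            port = port + 1 if port < 65535 else 1024  # pick another port
        raise RuntimeError("no free ports left")

    def inbound(self, public_port):
        """Find the LAN node that should receive traffic for a public port."""
        if public_port in self.forwards:
            return self.forwards[public_port]
        return self.table.get(public_port)
```

If two machines pick the same source port, the second one is simply given the next unused port, and the table still maps each response back to the right machine.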

8.9. Limits of IPv4

The IANA has been in charge of distributing IP addresses since 1988. Since that time the Internet has expanded and all 4.2 billion IPv4 addresses have been taken.

IANA has primarily been responsible for assigning address blocks to the five RIRs (regional internet registries): AFRINIC (Africa), ARIN (US, Canada and parts of the Caribbean), APNIC (most of Asia, Australia, New Zealand and Pacific Island nations), LACNIC (Central and South America, parts of the Caribbean), and RIPE (Europe, Russia, Middle East and parts of Central Asia).

IPv6 will eventually resolve the problem of address exhaustion, but implementing IPv6 worldwide is going to take some time. So for now NAT and non-routable address spaces are used as a workaround. With NAT, thousands of machines can use non-routable address space and share a single public IP, while still sending and receiving traffic from the Internet.
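
Python's standard ipaddress module can show both the size of the IPv4 space and whether an address is non-routable. The helper name is_non_routable is made up; note that the is_private property covers the usual private ranges plus a few other special-purpose blocks.

```python
import ipaddress

# The whole IPv4 space is one /0 network: 2**32 addresses.
TOTAL_IPV4_ADDRESSES = ipaddress.IPv4Network("0.0.0.0/0").num_addresses


def is_non_routable(address):
    """True if the address is not meant to be routed on the public Internet."""
    return ipaddress.ip_address(address).is_private
```

For example, is_non_routable("192.168.0.10") is True, while is_non_routable("8.8.8.8") is False.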

8.10. VPN

Businesses use lots of different technologies to keep their networks secure: firewalls, NAT, non-routable address space, etc. But sometimes employees need to access a network from outside (work from home, business trips, etc.). To achieve this goal VPNs are used.

VPN (virtual private network) is a technology that allows for the extension of a private or local network to a remote host that's not on this network.

8.11. Proxy services

A proxy service is a server that acts on behalf of a client in order to access another service.

There are many kinds of proxies, but the most common ones are:

  1. Web proxy

    • specifically built for web traffic, e.g., commonly used in businesses to prevent access to certain sites (like social media)

  2. Reverse proxy

    • service that appears as a single server to external clients, but actually represents many servers behind it
    • popular websites use reverse proxies to redirect incoming requests to lots of different physical servers (no single server could handle so much traffic)
    • another use: encryption and decryption are very resource demanding tasks, so reverse proxies use cryptographic hardware to deal with it, so that web servers are free to just serve the content
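
How a reverse proxy spreads requests over many servers depends on the product. The sketch below assumes plain round-robin rotation, which is only one possible strategy; ToyReverseProxy and the backend names are hypothetical.

```python
import itertools


class ToyReverseProxy:
    """Hands out backend servers in turn (round-robin)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend server is required")
        self._rotation = itertools.cycle(backends)  # endless repeating iterator

    def pick_backend(self):
        """Return the server that should handle the next incoming request."""
        return next(self._rotation)
```

External clients only ever see the proxy; which of the servers behind it answers a given request is decided by pick_backend.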


9. Connecting to the Internet

Technologies that connect devices to the Internet are as different and diverse as these devices themselves.

9.1. Dial-up

For years before Ethernet, TCP, or IP were invented, there were computer networks built from technologies that connected devices within close physical proximity to each other. But in the 1970s people realized that the PSTN (public switched telephone network), also known as POTS (plain old telephone service), could be used for long-distance computer networking. The first system that implemented this idea was Usenet (a precursor to dial-up).

9.2. Broadband connections

Broadband is any connectivity technology that isn't dial-up.

The most common broadband solutions are:

  1. T-Carrier technologies

    • T-Carrier technologies were first invented by AT&T as a system that allowed up to 24 simultaneous phone calls across a single copper cable. Years later this technology was repurposed for data transmission: each of the 24 phone channels was capable of a 64 kbps transfer rate, giving a single T1 line the ability to transmit data at about 1.5 Mbps (24 × 64 kbps = 1.536 Mbps).
    • Originally T1 technology was used only by telecom companies, but with the rise of the Internet in the 1990s businesses started to use it too. Later improvements led to T3, which allowed for 44.7 Mbps transfer rates (by multiplexing 28 T1 lines into a single link).
    • Today T-Carrier technologies have been mostly surpassed by other broadband technologies.
  2. DSL (digital subscriber lines)

    • In the early days of the Internet research showed that telephone lines were capable of transmitting way more data than what was needed for voice calls. Just like dial-up, DSL used POTS infrastructure, but in a more effective way by operating at a frequency range that didn't interfere with normal phone calls.

    • A DSL connection was able to send much more data (1.5 Mbps) than dial-up, and allowed normal voice phone calls and data transfer to occur at the same time on the same line.

    • Data is transferred through DSLAMs (DSL access multiplexers), which establish connections across phone lines (but unlike dial-up these connections are long-running, i.e., they aren't torn down until the DSLAM is powered off).

    • Most common types of DSL were:

      1. ADSL (asymmetric DSL)

        • different speeds for outbound and incoming data (faster download speeds and slower upload speeds)
        • was mainly used by home users, since they rarely need to upload as much data as they download
      2. SDSL (symmetric DSL)

        • same upload and download speeds
        • was mainly used by businesses that hosted servers that needed to send data to clients
      3. HDSL (high bit-rate DSL)

        • speeds above 1.5 Mbps
  3. Cable broadband

    • In the 1990s cable TV companies realized that their infrastructure can also be used for computer networking. Just like telephone lines, coaxial cables used for cable TV were capable of transmitting much more data than what was required for TV (by using frequencies that don't interfere with TV broadcast).
    • Unlike other broadband technologies, cable is generally a shared bandwidth technology. With technologies like DSL or dial-up the connection from a client goes directly to the CO (central office), which guarantees that a certain amount of bandwidth is available. With cable broadband, on the other hand, many users share a certain amount of bandwidth until the transmissions reach the ISP's core network (the shared segment could be anything from a single city block to entire subdivisions in the suburbs).
    • Cable connections are usually managed by a cable modem (device that connects a client to the CMTS, or cable modem termination system, which goes to an ISP's core network).

  4. Fiber connections

    • Fiber provides higher speeds and allows transmissions to travel much further without degrading, but producing and laying fiber is a lot more expensive than using copper cables.

    • Instead of a modem, the demarcation point for fiber technologies is an ONT (optical network terminator), which converts data from protocols the fiber network can understand to those that twisted pair copper networks can.

    • FTTX (fiber to the X):

      1. FTTN (fiber to the neighbourhood)

        • data is delivered to a single physical cabinet that serves a certain amount of the population (from this cabinet twisted pair copper or coax might be used for the last length of distance)
      2. FTTB (fiber to the building, business, or basement)

        • data is delivered to an individual building (after that twisted pair copper is typically used)
      3. FTTH (fiber to the home)

        • data is delivered to individual residences

9.3. WAN and point-to-point VPN

Often you might want to connect multiple local networks that are physically separated from one another into one large network (e.g., offices of one company, that are located in different cities). WANs and point-to-point VPNs are used for this purpose.

9.4. Wireless networking

Today fewer and fewer devices are weighed down by physical cables in order to connect to computer networks. Many devices now can use wireless networking.

802.11 data frame:

  1. Control field (16 bits)

    • contains a number of sub-fields that describe how the frame should be processed (e.g., which version of 802.11 was used)
  2. Duration field (16 bits)

    • specifies how long the frame is, so that the receiver knows how long it should expect to listen to the transmission
  3. Source and destination MAC addresses (96 bits)

  4. Receiving MAC address (48 bits)

    • MAC address of a wireless access point (device that bridges the wireless and wired portions of a network)
    • wireless network might have lots of different access points, so each device will associate itself with a certain access point (usually the closest one, or depending on a signal strength)
    • associations allow transmissions to wireless devices to be sent by the right access points
    • often the same as the destination MAC address
  5. Sequence control (16 bits)

    • sequence number that's used to keep track of ordering of the frames
  6. Transmitter MAC address (48 bits)

    • MAC address of a device that transmitted the frame
    • often the same as the source MAC address
  7. Payload

  8. Frame check sequence (32 bits)
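
The fixed-size fields listed above add up to a 30-byte header, which can be unpacked with Python's struct module. This is a sketch that follows the simplified layout of these notes: the field names and the order of the source and destination addresses are assumptions, and in real 802.11 frames the role of each address slot depends on flags in the control field.

```python
import struct
from typing import NamedTuple

# Little-endian: control, duration, three 6-byte addresses,
# sequence control, then a fourth 6-byte address.
HEADER_FORMAT = "<HH6s6s6sH6s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 2 + 2 + 18 + 2 + 6 bytes


class WifiHeader(NamedTuple):
    control: int
    duration: int
    source: str            # assumed to come first, as in the list above
    destination: str
    receiver: str
    sequence_control: int
    transmitter: str


def format_mac(raw):
    """Render 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_header(frame):
    """Split the header fields off the front of a frame (payload is ignored)."""
    if len(frame) < HEADER_SIZE:
        raise ValueError("frame is shorter than the header")
    control, duration, a1, a2, a3, seq, a4 = struct.unpack(
        HEADER_FORMAT, frame[:HEADER_SIZE]
    )
    return WifiHeader(
        control, duration,
        format_mac(a1), format_mac(a2), format_mac(a3),
        seq, format_mac(a4),
    )
```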

9.5. Wireless network configurations

There are a few main ways in which a wireless network can be configured:

  1. Ad-hoc network

    • the simplest one: no supporting network infrastructure

    • every device communicates directly with every other device within range, and all nodes help pass along messages

    • some practical applications:

      1. smartphones can establish ad-hoc networks and share data with each other
      2. in industrial or warehouse settings (where individual pieces of equipment might need to communicate with each other but not with anything else)
      3. during disaster situations (when all other infrastructure is absent)

  2. WLAN

    • the most common type of wireless networking
    • consists of wireless and wired networks, with access points acting as bridges between them
    • the wired network operates as a normal LAN and contains the outbound internet link

  3. Mesh network

    • hybrid of WLAN and ad-hoc networks
    • lots of the devices communicate with each other wirelessly forming a mesh, but there are also many access points, connected to a wired network

9.6. Wireless channels

Wireless devices share the same radio medium and there are no network switches to separate them, so collisions are inevitable. Channels help fix this problem to a certain extent.

A channel is an individual, smaller section of the overall frequency band used by a wireless network.

9.7. Wireless security

Wired networking has a certain amount of inherent privacy. That's not true for wireless communications: anyone within range could intercept the radio transmissions. That's why encryption is so important for wireless networks.

There are a few standard solutions:

9.8. Cellular networks

Another popular form of wireless networking is cellular networking, or mobile networking. In some places cellular networks are the most common way of connecting to the Internet.


10. Troubleshooting

Many of the protocols and network devices have built-in functionalities to help protect against networking failures and errors (e.g., misconfigurations, hardware problems, and system incompatibilities).

Error detection is the ability of a protocol or a program to determine that something went wrong (e.g., CRC).

Error recovery is the ability of a protocol or a program to attempt to fix the issue (e.g., TCP resending a segment that was not acknowledged).

10.1. ICMP

The inability to establish a connection to something is the most common networking issue. When a network error occurs, the device that detects the issue will communicate it to the source of the problematic traffic using ICMP (internet control message protocol). Frequently occurring errors: a router doesn't know how to route to a destination, a certain port is unreachable, the TTL of an IP datagram has expired, etc.

The makeup of an ICMP packet:

  1. Type (8 bits)

    • type of the message that's being delivered (e.g., destination unreachable, time exceeded, etc.)
  2. Code (8 bits)

    • indicates a specific reason for the message (e.g., for destination unreachable type, there are codes for destination network unreachable and destination port unreachable)
  3. Checksum (16 bits)

  4. Rest of header (32 bits; optional)

    • can be used by some of the types and codes to send more data
  5. Payload

    • lets the recipient of the message identify which of its transmissions generated the error
    • contains the entire IP header and the first eight bytes of the data payload of the offending packet
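
As an illustration of this layout, the sketch below builds an echo request (type 8, code 0, the message type used by ping), where the "rest of header" carries a 16-bit identifier and a 16-bit sequence number. The function names are made up; the checksum is the standard Internet checksum (one's complement of the one's complement sum of 16-bit words). Nothing is sent over the network.

```python
import struct

ECHO_REQUEST_TYPE = 8
ECHO_REQUEST_CODE = 0


def internet_checksum(data):
    """One's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload=b""):
    """Pack type, code, checksum, identifier and sequence, then the payload."""
    # The checksum is computed with the checksum field itself set to zero.
    header = struct.pack(
        "!BBHHH", ECHO_REQUEST_TYPE, ECHO_REQUEST_CODE, 0, identifier, sequence
    )
    checksum = internet_checksum(header + payload)
    header = struct.pack(
        "!BBHHH", ECHO_REQUEST_TYPE, ECHO_REQUEST_CODE, checksum, identifier, sequence
    )
    return header + payload
```

A receiver can verify a packet by running internet_checksum over the whole message: for an undamaged packet the result is 0.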

10.2. Ping

ICMP was designed for automatic use by networked devices, but sometimes these messages are useful to human operators too. Ping is a tool for exactly that (it exists in every OS): it lets the user send a special type of ICMP message called an echo request. If everything is working correctly, the destination sends back an echo reply. Most basic use: ping <IP or FQDN>.
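
Ping can also be driven from a script. A small sketch, assuming the ping utility is on the PATH; the helper names are made up, and the only flags used are the packet count options (-c on Linux and macOS, -n on Windows).

```python
import platform
import subprocess


def ping_command(target, count=4, system=None):
    """Build the ping command line for the current (or given) OS."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), target]


def ping(target, count=4):
    """Run ping and report whether it exited successfully."""
    result = subprocess.run(
        ping_command(target, count), capture_output=True, text=True
    )
    return result.returncode == 0
```

A zero exit code generally means that echo replies came back, but firewalls that drop ICMP can make a healthy host look unreachable.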

10.3. Traceroute

Communications across networks go through lots of intermediary nodes, so there needs to be a way to determine where in the chain of router hops a problem occurred. The traceroute utility is used for that. It lets the user discover the path between two nodes and gives information about each hop along the way.
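
Traceroute works by sending packets with a TTL of 1, then 2, then 3, and so on: each router that lets the TTL run out answers with an ICMP time exceeded message, which reveals one more hop. The simulation below only models that idea on a made-up list of node names and sends no packets.

```python
def simulate_traceroute(path, max_hops=30):
    """path lists the nodes from the first router to the destination."""
    hops = []
    for ttl in range(1, min(max_hops, len(path)) + 1):
        node = path[ttl - 1]           # where a packet with this TTL ends up
        reached = ttl == len(path)     # the last node is the destination
        hops.append((ttl, node, "reply" if reached else "time exceeded"))
    return hops
```

Each tuple corresponds to one line of traceroute output: the hop number, the node that answered, and the kind of answer it sent.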

10.4. Testing port connectivity

Ping and traceroute help to test connectivity between machines at the network layer. To check whether things work at the transport layer, the netcat (Linux and macOS) and Test-NetConnection (Windows) utilities are used.
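
The same kind of check can be scripted with Python's standard socket module. A minimal sketch that only tests TCP ports; the function name port_is_open is made up.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Try to open a TCP connection; True means something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name not resolved, ...
        return False
```

For example, port_is_open("example.com", 443) checks whether a web server accepts HTTPS connections. A False result does not say whether the port is closed or a firewall silently dropped the attempt.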

10.5. Name resolution tools

The most common name resolution tool is nslookup (available on all operating systems).
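
Name resolution can also be tested from code. The sketch below uses the operating system's resolver through socket.getaddrinfo, so unlike nslookup it also honours local sources such as the hosts file; the function name resolve is made up.

```python
import socket


def resolve(name):
    """Return the sorted, de-duplicated addresses the OS resolver finds."""
    infos = socket.getaddrinfo(name, None)
    # Each entry ends with a socket address whose first item is the IP.
    return sorted({info[4][0] for info in infos})
```

If the name cannot be resolved, getaddrinfo raises socket.gaierror, which is a useful signal in itself when troubleshooting.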

10.6. Public DNS servers

An ISP almost always gives access to a recursive name server as a part of the service it provides. But most businesses also run their own DNS servers (to also resolve names of internal hosts). A third option is to use a DNS-as-a-service provider, which is becoming more and more popular.

In any case it's helpful to have a way to test DNS functionality and also have a backup DNS option. That's where public DNS servers can help (name servers specifically set up by some Internet organization so that anyone can use them for free).

10.7. DNS registration and expiration

Domain names need to be globally unique for the system to work. At the top level this is the responsibility of ICANN, but the assignment of domain names to particular organizations and individuals is managed by registrars.

10.8. Host files

Long before DNS was established, it was clear that a language-based system for referring to network devices was needed. Host files were used for that (files that contain tables of network addresses and the corresponding host names).
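
The usual hosts file format is one address per line followed by one or more names, with # starting a comment. A sketch of a parser for that format (the function name parse_hosts and the sample entries are made up):

```python
def parse_hosts(text):
    """Turn the contents of a hosts file into a {name: address} table."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Host names are case-insensitive; the first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table
```

For example, the line "192.168.1.10 fileserver fs" maps both fileserver and fs to 192.168.1.10.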


11. The future of networking

11.1. The cloud

Cloud computing is a technological approach where computing resources are provisioned to users in a shareable way. It is based on the concept of hardware virtualization.

11.2. IPv6 addressing

By the mid-1990s, it was clear that the 4.2 billion addresses of the IPv4 address space would be exhausted at some point. IPv6 was developed to resolve this issue (IPv5 was an experimental protocol that introduced the concept of connections, but it never saw wide adoption, and connection state was handled better later on by the transport layer and TCP).

11.3. IPv6 datagram

An IPv6 datagram is an improved version of an IPv4 datagram:

  1. Version (4 bits)

    • identifies which version of IP the datagram uses (IPv4 or IPv6)
  2. Traffic class (8 bits)

    • allows for different classes of traffic to receive different priorities
  3. Flow label (20 bits)

    • used in conjunction with the traffic class field for routers to make decisions about the quality of service level for a specific datagram
  4. Payload length (16 bits)

  5. Next header (8 bits)

    • defines what kind of header is immediately after the current one (if any)
    • each additional header also contains its own next header field, which allows for a chain of optional headers
  6. Hop limit (8 bits)

    • identical in purpose to the TTL field in an IPv4 header
  7. Source and destination IP addresses (256 bits)

  8. Additional header (optional)

  9. Payload
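
The fixed fields above add up to exactly 40 bytes, which can be checked by packing them with Python's struct and ipaddress modules. This is a sketch with made-up function names and documentation-range addresses; it builds and parses only the fixed header, without optional headers or payload.

```python
import ipaddress
import struct

IPV6_VERSION = 6


def build_ipv6_header(source, destination, payload_length, next_header,
                      hop_limit=64, traffic_class=0, flow_label=0):
    """Pack the fixed IPv6 header fields in network byte order."""
    # Version (4 bits), traffic class (8 bits) and flow label (20 bits)
    # share the first 32-bit word.
    first_word = (IPV6_VERSION << 28) | (traffic_class << 20) | flow_label
    return (
        struct.pack("!IHBB", first_word, payload_length, next_header, hop_limit)
        + ipaddress.IPv6Address(source).packed
        + ipaddress.IPv6Address(destination).packed
    )


def parse_ipv6_header(header):
    """Unpack the fields of a 40-byte fixed IPv6 header into a dict."""
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", header[:8]
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": str(ipaddress.IPv6Address(header[8:24])),
        "destination": str(ipaddress.IPv6Address(header[24:40])),
    }
```

A next header value of 6 means the payload is a TCP segment, the same protocol number that IPv4 uses in its protocol field.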

11.4. IPv6 and IPv4

It's not possible for the entire Internet to switch to IPv6 at once, so a smooth transition approach is needed. IPv6 and IPv4 traffic need to coexist during this transition period. Many different technologies, protocols and methods are used for this: